Blind watermarking provides powerful evidence for copyright protection, image authentication, and tampering identification. However, it remains a challenge to design a watermarking model with high imperceptibility and robustness against strong noise attacks. To resolve this issue, we present a framework Combining the Invertible and Non-invertible (CIN) mechanisms. CIN is composed of an invertible part that achieves high imperceptibility and a non-invertible part that strengthens robustness against strong noise attacks. For the invertible part, we develop a diffusion and extraction module (DEM) and a fusion and split module (FSM) to embed and extract watermarks symmetrically in an invertible way. For the non-invertible part, we introduce a non-invertible attention-based module (NIAM) and a noise-specific selection module (NSM) to handle asymmetric extraction under strong noise attacks. Extensive experiments demonstrate that our framework significantly outperforms current state-of-the-art methods in both imperceptibility and robustness. Our framework achieves an average of 99.99% accuracy and 67.66 dB PSNR under noise-free conditions, and 96.64% accuracy and 39.28 dB PSNR under combined strong noise attacks. The code will be available at https://github.com/rmpku/CIN.
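The two figures quoted in the abstract above, PSNR in dB and extraction accuracy, can be illustrated with a minimal sketch. It is not taken from the CIN code base: the function names `psnr` and `bit_accuracy`, the flat pixel lists, and the 8-bit peak value of 255 are assumptions made for illustration, and accuracy is assumed here to mean the fraction of watermark bits recovered correctly.

```python
import math


def psnr(original, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_value=255.0 assumes 8-bit pixels (an assumption, not from the abstract).
    """
    if len(original) == 0 or len(original) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the cover image and the watermarked image.
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical images: distortion is zero
    return 10.0 * math.log10(max_value ** 2 / mse)


def bit_accuracy(embedded, extracted):
    """Fraction of watermark bits that the extractor recovers correctly."""
    if len(embedded) == 0 or len(embedded) != len(extracted):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(embedded, extracted) if a == b)
    return matches / len(embedded)
```

A higher PSNR means the watermarked image is closer to the cover image, which is how imperceptibility is usually quantified; robustness is then read off the bit accuracy after the image has passed through a noise attack.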
Learning generalizable insertion skills in a data-efficient manner has long been a challenge in the robot learning community. While the current state-of-the-art methods with reinforcement learning (RL) show promising performance in acquiring manipulation skills, the algorithms are data-hungry and hard to generalize. To overcome the issues, in this paper we present Prim-LAfD, a simple yet effective framework to learn and adapt primitive-based insertion skills from demonstrations. Prim-LAfD utilizes black-box function optimization to learn and adapt the primitive parameters leveraging prior experiences. Human demonstrations are modeled as dense rewards guiding parameter learning. We validate the effectiveness of the proposed method on eight peg-hole and connector-socket insertion tasks. The experimental results show that our proposed framework takes less than one hour to acquire the insertion skills and as few as fifteen minutes to adapt to an unseen insertion task on a physical robot.
When humans perform contact-rich manipulation tasks, customized tools are often necessary and play an important role in simplifying the task. For instance, in our daily life, we use various utensils for handling food, such as knives, forks and spoons. Similarly, customized tools for robots may enable them to more easily perform a variety of tasks. Here, we present an end-to-end framework to automatically learn tool morphology for contact-rich manipulation tasks by leveraging differentiable physics simulators. Previous work approached this problem by introducing manually constructed priors that required detailed specification of object 3D model, grasp pose and task description to facilitate the search or optimization. In our approach, we instead only need to define the objective with respect to the task performance and enable learning a robust morphology by randomizing the task variations. The optimization is made tractable by casting this as a continual learning problem. We demonstrate the effectiveness of our method for designing new tools in several scenarios such as winding ropes, flipping a box and pushing peas onto a scoop in simulation. We also validate that the shapes discovered by our method help real robots succeed in these scenarios.
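Recent advances in end-to-end speech synthesis have made it possible to generate highly natural speech. However, training these models typically requires a large amount of high-fidelity speech data, and for unseen texts the prosody of the synthesized speech is relatively unnatural. To address these issues, we propose combining a fine-tuned BERT-based front-end with a pre-trained FastSpeech2-based acoustic model to improve prosody modeling. The pre-trained BERT is fine-tuned on the polyphone disambiguation task, the joint Chinese word segmentation (CWS) and part-of-speech (POS) tagging task, and the prosody structure prediction (PSP) task in a multi-task learning framework. FastSpeech 2 is pre-trained on large-scale external data that are noisy but easier to obtain. Experimental results show that both the fine-tuned BERT model and the pre-trained FastSpeech 2 can improve prosody, especially for structurally complex sentences.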
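This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. The challenge uses the new Large-scale Diverse Video (LDV) dataset and has three tracks. Tracks 1 and 2 aim to enhance videos compressed by HEVC at a fixed QP, while Track 3 targets videos compressed by x265 at a fixed bit-rate. In addition, Tracks 1 and 3 aim to improve fidelity (PSNR), whereas Track 2 aims to improve perceptual quality. The three tracks attracted 482 registrations in total. In the test phase, 12 teams, 8 teams, and 11 teams submitted final results for Tracks 1, 2, and 3, respectively. The proposed methods and solutions gauge the state of the art in video quality enhancement. The homepage of the challenge: https://github.com/renyang-home/ntire21_venh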
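With the thriving of deep learning in processing point cloud data, recent works show that backdoor attacks pose a severe security threat to 3D vision applications. The attacker injects a backdoor into the 3D model by poisoning a few training samples with a trigger, such that the backdoored model performs well on clean samples but behaves maliciously when the trigger pattern appears. Existing attacks often insert some additional points into the point cloud, or use a linear transformation (e.g., rotation), to construct the poisoned point cloud. However, the effect of these poisoned samples is likely to be weakened or even eliminated by commonly used pre-processing techniques for 3D point clouds, e.g., outlier removal or rotation augmentation. In this paper, we propose a novel imperceptible and robust backdoor attack (IRBA) to tackle this challenge. We use a nonlinear and local transformation, called weighted local transformation (WLT), to construct poisoned samples with unique transformations. Because WLT involves several hyper-parameters and randomness, it is difficult to produce two similar transformations. Consequently, poisoned samples with unique transformations are likely to be resistant to the aforementioned pre-processing techniques. Moreover, because the distortion caused by a fixed WLT is controllable and smooth, the generated poisoned samples are also imperceptible to human inspection. Extensive experiments on three benchmark datasets and four models show that IRBA achieves an attack success rate (ASR) of over 80% in most cases even with pre-processing techniques, which is significantly higher than previous state-of-the-art attacks.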
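This paper aims to provide a novel design of multiscale framelet convolution for spectral graph neural networks. In the spectral paradigm, spectral GNNs improve performance on graph learning tasks by proposing various spectral filters in the spectral domain to capture both global and local graph structure information. Although existing spectral approaches show superior performance on some graphs, they lack flexibility and are fragile when graph information is incomplete or perturbed. Our new framelet convolution incorporates filtering functions designed directly in the spectral domain to overcome these limitations. The proposed convolution shows great flexibility in cutting off spectral information and effectively mitigates the negative effect of noisy graph signals. In addition, to exploit the heterogeneity in real-world graph data, the heterogeneous graph neural network with our new framelet convolution provides a solution for embedding the intrinsic topological information of meta-paths with multi-level graph analysis. Extensive experiments have been conducted on real-world heterogeneous and homogeneous graphs under settings with noisy node features, and superior performance is achieved.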
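Graphic design is ubiquitous in people's daily lives. In graphic design, the most time-consuming task is laying out the various components of an interface. Repetitive manual layout design wastes a great deal of professional graphic designers' time. Existing templates are usually rudimentary and unsuitable for most designs, which reduces efficiency and limits creativity. This paper applies the Transformer model and the conditional variational autoencoder (CVAE) to the graphic design layout generation task. It proposes an end-to-end graphic design layout generation model named layoutt-cvae. We also propose element disentanglement and feature-based disentanglement strategies, and introduce new graphic design principles and similarity metrics into the model, which significantly improves the controllability and interpretability of the deep model. Compared with existing state-of-the-art models, the layouts generated by our model perform better on many metrics.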
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
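As a rough illustration of the ideas in the abstract above, the sketch below stamps a fixed trigger onto raw images, in the spirit of what the abstract says NAIVEATTACK does before distillation, and computes an attack success rate. It is not the authors' implementation: the square corner patch, the names `stamp_trigger` and `attack_success_rate`, and the list-of-rows image format are all assumptions made for illustration, and ASR is assumed to mean the fraction of triggered inputs classified as the attacker's target label.

```python
def stamp_trigger(image, patch_value=1.0, patch_size=2):
    """Return a copy of a 2D image (list of rows) with a square trigger patch
    written into the bottom-right corner. The patch shape, position, and value
    are illustrative assumptions; the abstract does not specify the trigger."""
    stamped = [row[:] for row in image]  # copy rows so the input stays untouched
    height, width = len(stamped), len(stamped[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = patch_value
    return stamped


def attack_success_rate(predictions, target_label):
    """Fraction of predictions on triggered inputs that equal the target label."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    hits = sum(1 for p in predictions if p == target_label)
    return hits / len(predictions)
```

Under this reading, an ASR close to 1.0 means that nearly every triggered test input is classified as the target label by a model trained on the distilled data.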